On the Complexity of Learning Decision Trees
نویسنده
چکیده
Various factors a ecting decision tree learning time are explored. The factors which consistently a ect accuracy are those which directly or indirectly (as in the handling of continuous attributes) allow a greater variety of potential trees to be explored. Other factors, e.g., pruning and choice of heuristics, generally have little e ect on accuracy, but signi cantly a ect learning time. We prove that the time complexity of induction and post-processing is exponential in tree height in the worst case and, under fairly general conditions, in the average case. This puts a premium on designs which produce shallower and more balanced trees. Simple pruning is linear in tree height, contrasted to the exponential growth of more complex operations. The key factor in uencing whether simple pruning will su ce is that the split selection and pruning heuristics should be the same and unbiased. The information gain and 2 tests are biased towards unbalanced splits, and neither is admissible for pruning. Empirical results show that the hypergeometric function can be used for both split selection and pruning, and that the resulting trees are simpler, more quickly learned, and no less accurate than trees resulting from other heuristics and more complex post-processing.
منابع مشابه
بررسی کارایی مدل درختان تصمیمگیری در برآورد رسوبات معلق رودخانهای (مطالعه موردی: حوضه سد ایلام)
The real estimation of the volume of sediments carried by rivers in water projects is very important. In fact, achieving the most important ways to calculate sediment discharge has been considered as the objective of the most research projects. Among these methods, the machine learning methods such as decision trees model (that are based on the principles of learning) can be presented. Decision...
متن کاملدستهبندی دادههای دوردهای با ابرمستطیل موازی محورهای مختصات
One of the machine learning tasks is supervised learning. In supervised learning we infer a function from labeled training data. The goal of supervised learning algorithms is learning a good hypothesis that minimizes the sum of the errors. A wide range of supervised algorithms is available such as decision tress, SVM, and KNN methods. In this paper we focus on decision tree algorithms. When we ...
متن کاملMinimization of Decision Trees Is Hard to Approximate
Decision trees are representations of discrete functions with widespread applications in, e.g., complexity theory and data mining and exploration. In these areas it is important to obtain decision trees of small size. The minimization problem for decision trees is known to be NP-hard. In this paper the problem is shown to be even hard to approximate up to any constant factor.
متن کاملCS 880 : Advanced Complexity Theory 2 / 11 / 2008 Lecture 8 : Active Learning
Last time we studied computational learning theory and saw how harmonic analysis could be used to design and analyze efficient learning algorithms with respect to the uniform distribution. We developed a generic passive learning algorithm for concepts whose Fourier spectrum is concentrated on a known set, and applied it to decision trees. We also started developing an approach for the case wher...
متن کاملLearning Nested Halfspaces and Uphill Decision Trees
Predicting class probabilities and other real-valued quantities is often more useful than binary classification, but comparatively little work in PAC-style learning addresses this issue. We show that two rich classes of real-valued functions are learnable in the probabilisticconcept framework of Kearns and Schapire. Let X be a subset of Euclidean space and f be a real-valued function on X. We s...
متن کاملLearning Small Trees and Graphs that Generalize
In this Thesis we study issues related to learning small tree and graph formed classifiers. First, we study reduced error pruning of decision trees and branching programs. We analyze the behavior of a reduced error pruning algorithm for decision trees under various probabilistic assumptions on the pruning data. As a result we get, e.g., new upper bounds for the probability of replacing a tree t...
متن کامل